ABSTRACT

User engagement refers to the amount of interaction an
instance (e.g., tweet, news, and forum post) achieves. Ranking
the items in social media websites by the amount of user
participation in them can be used in different applications,
such as recommender systems. In this paper, we consider a
tweet containing a rating for a movie as an instance and focus
on ranking the instances of each user by their engagement,
i.e., the total number of retweets and favorites each will gain.

For this task, we define several features which can be
extracted from the meta-data of each tweet. The features are
partitioned into three categories: user-based, movie-based,
and tweet-based. We show that in order to obtain good
results, features from all categories should be considered. We
exploit regression and learning to rank methods to rank the
tweets, and propose to aggregate the results of regression and
learning to rank methods to achieve better performance.

We have run our experiments on an extended version of the
MovieTweetings dataset provided by the ACM RecSys Challenge
2014. The results show that the learning to rank approach
outperforms most of the regression models and that combining
the two can improve the performance significantly.

Categories and Subject Descriptors 

H.2.8 [Database Management]: Data Mining; J.4 [Computer
Applications]: Social and Behavioral Sciences

General Terms 

Algorithm, Experimentation 
1. INTRODUCTION

Twitter is an online social information network which has
become tremendously popular in the past few years [19].
Millions of users are sharing rich information using social
media sites, such as Twitter, which can be used by social
recommender systems [?]. Item providers often let users
express their opinion about an item in social networks. For
instance, users can give a rating to each movie on the Internet
Movie Database (IMDb) website¹ and also share it on Twitter.
This increases the importance of considering social
media sites for recommendation and information filtering
systems [31].

Product rating prediction is a traditional recommender
system problem which has been studied extensively in the
literature [10][23][24]. One important issue in recommender
systems is the engagement gained by the users' comments
and opinions. When users share their comments on
different items, the amount of user interaction achieved by
each comment can be used to improve the quality of
recommender systems. In this paper, we focus on ranking these
comments by their engagement.

We focus on movie ratings tweeted by IMDb users on
Twitter. Hereafter, we use the word “engagement” for the user
interaction a tweet has gained, measured as the sum of its
retweets and favorites. Our purpose is to
rank the tweets of each user, each containing a rating for a
movie in IMDb, by their engagement.

For this task, we first extract several features from the
tweets. The features are categorized into three groups:
user-based, movie-based, and tweet-based. It should be noted
that the content of the tweets is hidden and there is no
textual feature among our defined features. Then, we
propose two different supervised approaches to rank
the tweets. The first approach tries to predict the tweets'
engagement globally. In other words, although our
purpose is to sort the tweets of each user, we consider the
tweets of all users together, fit regression models to predict
the engagement of each tweet, and then extract the sorted
list of each user from the global ranked list. In the
second approach, we rank the tweets of each user by their
engagement without predicting the engagement values. To this
aim, we use learning to rank, which is extensively
exploited in information retrieval, natural language
processing, and recommender systems. In contrast to regression
models, which try to predict the engagement by considering all
the tweets together, learning to rank methods focus on
maximizing an objective function for each user. Since
regression and learning to rank methods take these different
points of view, we further propose to aggregate the results
obtained by different regression and learning to rank
methods to improve the performance.

¹ http://imdb.com

In the experiments, we use an extended version of the
MovieTweetings dataset [5] provided by the ACM RecSys Challenge
2014 and report the results of a number of state-of-the-art
regression and learning to rank methods separately. We
further discuss the aggregation of the results of these two
approaches. The experimental results show that although the
regression methods alone perform worse than the learning to
rank methods, aggregating regression and learning to rank
methods improves the results significantly.

2. RELATED WORK 

The problem of engagement prediction or online participation
has been studied from different points of view in news
websites, social networks, and discussion forums. Several
machine learning algorithms have been used in the literature
for this task.

To address the problem of engagement prediction, several
features have been proposed for training a model. Suh et
al. [2H] have analyzed the factors impacting
the number of retweets. They concluded that hashtags,
number of followers, number of followees, and the account
age play important roles in increasing the probability of a
tweet being retweeted. Zaman et al. [31] have trained
a probabilistic collaborative filtering model to predict
future retweets using the history of previous ones.

Linear models have been used in some other studies to
predict the popularity of videos on YouTube by observing
their popularity after regular periods [23]. Petrovic et al.
[26] have proposed a passive-aggressive algorithm to predict
whether a tweet will be retweeted or not.

Recognizing popular messages is a similar problem, which
is used for breaking news detection and
personalized tweet/content recommendation. Hong et al.
[?] have formulated this task as a classification problem
by exploiting content-based features, temporal information,
meta-data of messages, and the users' social graph.

Predicting the extent to which a news item is going to be
breaking, or how many comments it is going to gain, is another
engagement prediction problem. Tatar et al. [30] have
analyzed a news dataset to address this problem. They have
focused on sorting the articles based on their future
popularity and have proposed to use linear regression for
this task.

It is worth noting that ranking instances is a problem
which has been extensively studied in the information
retrieval, natural language processing, and machine learning
fields [21]. To solve a similar problem, Uysal and Croft [31]
have proposed the “Coordinate Ascent learning to rank”
algorithm to rank tweets for a user in a way that tweets which
are more likely to be retweeted come on top. They have also
worked on ranking users for a tweet in a way that the higher
the rank, the more likely the given tweet will be retweeted.
Several learning to rank algorithms have been proposed in
the literature. Moreover, there are some supervised and
unsupervised ensemble methods to aggregate different
rankings, such as Borda Count [2] and Cranking [20]. Previous
studies show that in many cases, ranking aggregation
methods outperform single ranking methods [8][21].


3. METHODOLOGY 

In general, our idea is to extract a number of features
for each tweet and then train machine learning
models on the training data. Then, for each user in the test
data, we apply the learned model to rank his/her tweets by
their engagement. In this section, we first introduce the
features, and then we propose some machine learning
approaches to rank the tweets by their engagement. We
also aggregate the results of these different techniques
to improve the performance. In the following subsections,
we explain our methodology in detail.

3.1 Features 

Each tweet contains the opinion of a user about a specific
movie. We partition the features extracted from each tweet
into three different categories: user-based, movie-based, and
tweet-based features. Overall, we extract several features
from each tweet T tweeted by user U about movie M.
User-based features give us some information about the user who
has tweeted his/her opinion about a specific movie. These
features are not tweet-specific and are equal for all
tweets of each user. The total number of followers of U is an
example of user-based features. Movie-based features only
include information about movie M, e.g., the total number of
tweets about movie M. Tweet-based features contain specific
information of tweet T. This information may also contain
the opinion of user U about movie M. The time and language
of a tweet are two examples of tweet-based features.
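
As a rough illustration of how some of the derived features described in Table 1 could be computed from raw meta-data, consider the sketch below. The function name, argument names, and dictionary keys are hypothetical, and two choices are assumptions rather than details given in the paper: the membership age is measured up to the time of the tweet, and ratios with a zero denominator are set to 0.

```python
from datetime import date


def derived_features(followers, followees, registered_on, tweeted_on,
                     rate, avg_rating_of_movie):
    """Compute a few derived features for one tweet (illustrative only)."""
    # Days between the user's registration and the tweet (assumption:
    # this is the "membership age" used by the frequency features).
    membership_days = (tweeted_on - registered_on).days
    return {
        "membership_age": membership_days,
        # Followers gained per day of membership.
        "attracting_followers_frequency": (
            followers / membership_days if membership_days > 0 else 0.0
        ),
        # Followers/Followees ratio; 0 when the user follows nobody (assumption).
        "followers_per_followee": followers / followees if followees > 0 else 0.0,
        # Followers-Followees difference.
        "followers_minus_followees": followers - followees,
        # Rating given by the user minus the movie's average rating.
        "opinion_difference": rate - avg_rating_of_movie,
    }


# Hypothetical user and tweet.
features = derived_features(
    followers=300, followees=150,
    registered_on=date(2013, 1, 1), tweeted_on=date(2013, 1, 31),
    rate=8, avg_rating_of_movie=6.5,
)
```

The remaining frequency features in Table 1 would follow the same pattern of dividing a count by the membership age.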

The name and description of the extracted features are
shown in Table 1. These features are extracted for each
tweet T. We specify the category of the features and also
their type; “N”, “C”, and “B” are used for numerical,
categorical, and boolean types, respectively. It should be noted
that the feature values are normalized using the z-score
normalization method.
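
The z-score normalization mentioned above can be sketched as follows. This is a minimal illustration, assuming the population standard deviation is used and that a constant feature column is mapped to zeros; the paper does not state either detail, and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Standardize a numeric feature column to zero mean and unit variance."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        # A constant column carries no information; map it to zeros (assumption).
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


followers = [10, 250, 40, 1200]  # hypothetical follower counts
normalized = z_score(followers)
```

In practice the mean and standard deviation would be estimated on the training data and reused for the test data.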

We also perform feature selection to improve the
performance and also to analyse the effectiveness of the proposed
features. We exploit backward elimination for feature
selection. The bolded features in Table 1 are those that are
retained after performing feature selection. We discuss the
selected features in Subsection 4.1.
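
Backward elimination starts from the full feature set and repeatedly removes features. The sketch below shows one common greedy variant, assuming that a feature is dropped only while doing so improves a validation score; the stopping rule, the `score` callback, and the toy scoring function are assumptions, not details given in the paper.

```python
def backward_elimination(features, score):
    """Greedy backward elimination.

    `score(subset)` is a hypothetical callback that returns a validation
    score for a list of feature names (higher is better).
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by dropping exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        candidate_score, drop = max(candidates)
        if candidate_score <= best:
            break  # no single removal improves the score
        best = candidate_score
        selected.remove(drop)
    return selected, best


# Toy scoring function: two features are useful, every feature has a cost.
USEFUL = {"followers", "rate"}


def toy_score(subset):
    return 10 * len(USEFUL & set(subset)) - len(subset)


selected, best = backward_elimination(
    ["followers", "hour", "rate", "day"], toy_score
)
```

With a real model, `score` would train on the candidate subset and return, for example, the cross-validated NDCG@10.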

3.2 Machine Learning Techniques for User Engagement Ranking

In this subsection, we propose two different learning based
approaches to rank the tweets of each user by their
engagement. The first approach predicts the engagement
of tweets globally. In other words, to predict the
engagement of the tweets of a user, we train the model on the
tweets of all users and not only the tweets of that
user. To this aim, we use regression models to predict the
engagement of each tweet. The second approach ranks the
tweets of each user without predicting their engagement.
We exploit learning to rank methods, which focus on ranking
the tweets of each user individually and try to maximize a
given objective function for each user. Finally, we propose to
aggregate the regression and learning to rank results using
the supervised Kemeny approach [1]. In the following, we
explain our proposed methods in detail.


Table 1: Extracted features from each tweet T tweeted by user U about movie M

| Cat. | Feature Name | Type | Description |
|---|---|---|---|
| User-based | Number of followers | N | The total number of users who are following user U in Twitter. |
| User-based | Number of followees | N | The total number of users who are followed by user U in Twitter. |
| User-based | Number of tweets | N | The total number of tweets written by user U. |
| User-based | Number of IMDb tweets | N | The total number of tweets tweeted by user U using IMDb about different movies. |
| User-based | Average of ratings | N | The average of ratings provided by user U about different movies in IMDb. |
| User-based | Number of liked tweets | N | The total number of tweets which are liked by user U. |
| User-based | Number of lists | N | The total number of Twitter lists which user U is involved in. |
| User-based | Tweeting frequency | N | The frequency of tweets written by user U in each day. |
| User-based | Attracting followers frequency | N | The frequency of attracting followers per day. This feature is calculated by dividing the total number of followers by the membership age of user U in Twitter in terms of number of days. |
| User-based | Following frequency | N | The frequency of following different users by user U per day. |
| User-based | Like frequency | N | The frequency of liking tweets by user U per day. |
| User-based | Followers/Followees | N | The total number of followers of user U divided by the total number of his/her followees. |
| User-based | Followers-Followees | N | The difference between the total number of followers and followees of user U. |
| Movie-based | Number of tweets about M | N | The total number of tweets tweeted using IMDb about movie M. This feature shows how much movie M is rated by different users around the world in IMDb. |
| Movie-based | Average rating of M | N | The average of ratings reported by different users for movie M. |
| Tweet-based | Rate | N | The rating provided by user U for movie M. This rating is a positive integer up to 10. |
| Tweet-based | Mention count | N | The total number of people who are mentioned in tweet T. |
| Tweet-based | Number of hash-tags | N | The total number of hash-tags used in tweet T. |
| Tweet-based | Tweet age | N | The age of tweet T in terms of number of days. |
| Tweet-based | Membership age until now | N | The number of days from when user U registered in Twitter until when tweet T is tweeted. |
| Tweet-based | Opinion difference | N | The difference between the rate tweeted by user U for movie M and the average of rates given by different users about movie M. |
| Tweet-based | Hour of tweet | C | The hour when tweet T is tweeted. This feature is an integer between 0 and 23. |
| Tweet-based | Day of tweet | C | The day of the week on which tweet T is tweeted. |
| Tweet-based | Time of tweet | C | The part of the day in which tweet T is tweeted. We have partitioned each day into four parts. |
| Tweet-based | Holidays or not | B | This feature indicates whether tweet T is tweeted on holidays or not. |
| Tweet-based | Same language or not | B | This feature indicates whether tweet T is tweeted in the same language as the default language of user U or not. |
| Tweet-based | English or not | B | This feature indicates whether tweet T is tweeted in English or not. |

3.2.1 Regression 

To rank the tweets of each user by their possible
engagement, we can first predict the engagement of each
tweet and then sort the tweets by their predicted values.
To predict the engagement, we propose to train regression
models using the features defined in Subsection 3.1 as the
inputs and the engagement values as the labels. Then, we apply
the learned model to the same features extracted from the
test set.

To create the regression model, we exploit Extremely
Randomized Trees (also known as Extra-Trees) [11], Bayesian
Ridge Regression [22], and Stochastic Gradient Descent
Regression (SGDR) [3]. Extra-Trees are tree-based ensemble
regression methods which have been used successfully in
several tasks. In Extra-Trees, when a tree is built, each node
is split by drawing random cut-points for a random subset of
features and choosing the best of these candidate splits. The
results of all trees are combined by averaging the individual
predictions. SGDR is a linear regression model fitted by
minimizing a regularized empirical loss function
using stochastic gradient descent.
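
The regression approach can be illustrated with scikit-learn, the library used in the experiments. The sketch below is not the authors' code: the data is synthetic, the hyper-parameters are arbitrary, and the choice of `ExtraTreesRegressor`, `BayesianRidge`, and `SGDRegressor` as the concrete classes is an assumption. It fits each model on all training tweets and then derives a per-user ranking from the global predictions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import BayesianRidge, SGDRegressor

# Synthetic stand-in for normalized tweet features and engagement labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = np.maximum(0.0, 2.0 * X_train[:, 0] + rng.normal(size=200))
X_test = rng.normal(size=(6, 5))
test_users = ["u1", "u1", "u1", "u2", "u2", "u2"]  # author of each test tweet

models = {
    "XT": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "BRR": BayesianRidge(),
    "SGDR": SGDRegressor(random_state=0),
}


def rank_per_user(users, scores):
    """Group tweet indices by user, sorted by descending predicted score."""
    per_user = {}
    for idx, (user, score) in enumerate(zip(users, scores)):
        per_user.setdefault(user, []).append((score, idx))
    return {
        user: [idx for _, idx in sorted(items, key=lambda pair: -pair[0])]
        for user, items in per_user.items()
    }


rankings = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # trained globally on all users' tweets
    rankings[name] = rank_per_user(test_users, model.predict(X_test))
```

Each entry of `rankings` maps a user to the indices of his/her test tweets, most engaging first according to that model.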

3.2.2 Learning to Rank 

Instead of predicting the exact engagement values, we can
rank the tweets directly. Learning to Rank (LTR) methods are
machine learning techniques which try to solve ranking
problems [?]. LTR methods have been widely used in many
different areas such as information retrieval, natural
language processing, and recommender systems [?]. LTR methods
train a ranking model and use the learned model to rank the
instances using several features which are extracted from each
instance.

To build our LTR model, we consider a number of ranking
algorithms which are among the state of the art on many test
collections: ListNet [7], RankingSVM [15], AdaRank [33],
RankNet [?], LambdaRank [?], and ListMLE [32]. ListNet
is a probabilistic listwise approach to ranking problems,
which exploits a parameterized Plackett-Luce model to
compute the probabilities of different permutations. RankingSVM
is a pairwise ranking approach which uses an SVM classifier
at its core. The basic idea behind AdaRank is to construct
some weak rankers and combine them linearly to achieve
better performance. While RankingSVM creates a
ranking model by minimizing the classification error on
instance pairs, AdaRank tries to minimize a loss function
which is directly defined on an evaluation measure (such as
NDCG@10). RankNet is a pairwise method that
adopts cross entropy as the loss function. RankNet employs
a three-layer neural network with a single output node
to compare each pair of instances. LambdaRank is a ranking
algorithm inspired by RankNet which uses a gradient
descent approach to optimize the evaluation measure. Similar
to ListNet, ListMLE is a probabilistic listwise approach; it
ranks instances by minimizing a logarithmic (likelihood) loss
function.
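
To make the difference between pairwise and listwise objectives concrete, the sketch below computes a RankNet-style pairwise cross-entropy loss and a ListNet-style listwise loss for given scores. It only evaluates the losses; it does not train a model. The listwise loss uses the top-one probability simplification of the Plackett-Luce model, which is an assumption about the variant, and all function names and example scores are hypothetical.

```python
import math


def ranknet_pair_loss(s_i, s_j, target):
    """Cross-entropy loss for one pair of instances.

    `target` is 1.0 if i should rank above j, 0.0 if below, 0.5 for a tie.
    Equivalent to -t*log(p) - (1-t)*log(1-p) with p = sigmoid(s_i - s_j),
    written in a form that avoids log(0).
    """
    diff = s_i - s_j
    return math.log1p(math.exp(-diff)) + (1.0 - target) * diff


def softmax(scores):
    """Turn scores into top-one probabilities."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top_one_loss(true_scores, predicted_scores):
    """Cross entropy between true and predicted top-one distributions."""
    p_true = softmax(true_scores)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


# Hypothetical engagements of one user's tweets and two candidate scorings.
engagement = [3.0, 1.0, 0.0]
good_scores = [2.0, 1.0, 0.0]
bad_scores = [0.0, 1.0, 2.0]
good_loss = listnet_top_one_loss(engagement, good_scores)
bad_loss = listnet_top_one_loss(engagement, bad_scores)
```

Both losses are computed within one user's list of tweets, which is what distinguishes these methods from the global regression models.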

3.2.3 Aggregating Regression and Learning to Rank Outputs

As discussed above, regression and learning
to rank techniques take two different points of view into
consideration and their results might be totally different.
Therefore, by aggregating their results, the performance can
potentially be increased.


To aggregate all the mentioned regression and learning
to rank results, we use the supervised Kemeny approach [1].
Kemeny optimal aggregation [?] tries to minimize the total
number of pairwise disagreements between the final ranking
and the outputs of all base rankers. In other words, if
$r_1, r_2, \ldots, r_n$ represent the outputs of $n$ different
rankers, the final ranking $r^*$ is computed as:

\[ r^* = \arg\min_{r} \sum_{i=1}^{n} k(r, r_i) \]

where $k(\alpha, \beta)$ is the Kendall tau distance [18], measured as:

\[ k(\alpha, \beta) = \left| \{ (i, j) : i < j,\ \alpha_i > \alpha_j \wedge \beta_i < \beta_j \} \right| \]

where $\alpha_i$ denotes the position of instance $i$ in ranking $\alpha$.

While in Kemeny optimal aggregation all the rankers have
the same importance, the supervised Kemeny approach assumes
that there is a weight for each ranker. In more detail,
instead of simply counting the number of disagreements,
supervised Kemeny uses the following equation to compute the
final ranking:

\[ r^* = \arg\min_{r} \sum_{i=1}^{n} w_i \, k(r, r_i) \]

where $w_i$ denotes the weight of ranker $i$. To find the weight
of each ranker, we propose to perform a Randomized Search
[3]. To this aim, we perform cross validation over the training
data and find the optimal weight for each ranker.
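
The weighted aggregation can be illustrated with a brute-force sketch that searches all permutations for the ranking with the smallest weighted sum of Kendall tau distances to the base rankings. This is only an illustration under stated assumptions: exhaustive search is feasible only for short lists (Kemeny aggregation is NP-hard in general), the distance counts every pair that the two rankings order differently, and the rankings and weights are hypothetical rather than learned by randomized search.

```python
from itertools import combinations, permutations


def kendall_tau_distance(a, b):
    """Number of item pairs that rankings a and b order differently.

    Both rankings are lists of the same item ids, best first.
    """
    pos_a = {item: p for p, item in enumerate(a)}
    pos_b = {item: p for p, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def weighted_kemeny(rankings, weights):
    """Ranking minimizing the weighted sum of distances to the base rankings."""
    items = rankings[0]

    def cost(candidate):
        return sum(
            w * kendall_tau_distance(candidate, r)
            for r, w in zip(rankings, weights)
        )

    return list(min(permutations(items), key=cost))


# Hypothetical outputs of three base rankers for one user's tweets.
base_rankings = [
    ["t1", "t2", "t3"],
    ["t2", "t1", "t3"],
    ["t1", "t3", "t2"],
]
ranker_weights = [0.5, 0.3, 0.2]  # hypothetical weights
final_ranking = weighted_kemeny(base_rankings, ranker_weights)
```

Setting all weights to the same value recovers the unweighted Kemeny objective.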

4. EXPERIMENTS 

In the experiments, we consider an extended version of the
MovieTweetings dataset², which is provided by the ACM
RecSys Challenge 2014 [27]. The dataset contains movie
ratings which are automatically tweeted by the users of the
IMDb iOS application. The reported results throughout this work
are those obtained on the test set. The evaluation
measure is the mean of the normalized discounted cumulative
gain [14] computed for the top 10 tweets of each user. We call
it NDCG@10 hereafter.
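
A minimal sketch of the evaluation measure is given below. It assumes the engagement value itself is used as the gain with a logarithmic (base 2) position discount, and that a user whose tweets all have zero engagement contributes 0; the exact gain and tie-handling conventions of the challenge are not stated here, so these are assumptions, and the example data is hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain of the first k gains (position 0 is the top)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(engagements_in_predicted_order, k=10):
    """NDCG@k for one user, given true engagements in the predicted order."""
    ideal = dcg_at_k(sorted(engagements_in_predicted_order, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumption: users with no engagement at all score 0
    return dcg_at_k(engagements_in_predicted_order, k) / ideal


def mean_ndcg(per_user_lists, k=10):
    """Average NDCG@k over all users."""
    return sum(ndcg_at_k(e, k) for e in per_user_lists.values()) / len(per_user_lists)


# Hypothetical true engagements, listed in the order a model ranked the tweets.
predicted_order = {"u1": [5, 0, 2], "u2": [0, 1]}
score = mean_ndcg(predicted_order)
```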

In our experiments, we used the scikit-learn library [25] for
all the regression and feature selection algorithms. To
select the parameters of the learning methods, we performed
hyper-parameter optimization using Randomized Search [3]
with 5-fold cross validation. For the learning to rank
algorithms except AdaRank, we exploited an open source
package named ToyBox-Ranking. For AdaRank, we used the
software developed at Microsoft Research [33].

4.1 Experimental Results and Discussion 

In this subsection, we report and discuss the results of 
different regression and learning to rank methods. We also 
provide the results obtained by aggregating the regression 
and learning to rank results using the supervised Kemeny 
approach. 

To show the impact of feature selection, we report the
results of regression and learning to rank methods both
before and after feature selection. As mentioned before, the
bolded features in Table 1 are those retained after
performing backward elimination. The selected features are
spread across all three feature categories. This shows
the importance of using a combination of different kinds of
features in this problem. The selected user-based features
show how active and popular the user is on Twitter.
Interestingly, all the boolean features are selected and none
of the categorical features are retained. The reason may be
that a boolean feature takes only two fixed values, so the
difference between its values does not have to be interpreted
as a continuous quantity, which may make these features easier
and more efficient to use. Moreover, for the categorical
features, we assign a number to each possible category and the
arithmetic difference between these numbers is not
informative.

² http://2014.recsyschallenge.com/

Table 2: Regression results (NDCG@10) with and without feature selection

| REG method | REG w/ FS | REG w/o FS |
|---|---|---|
| XT | 0.7441384724 | 0.7863435909 |
| BRR | 0.7541443109 | 0.7759180414 |
| SGDR | 0.7507494314 | 0.8168741812 |

Table 3: Learning to rank results (NDCG@10) with and without feature selection

| LTR method | LTR w/ FS | LTR w/o FS |
|---|---|---|
| ListNet | 0.8243394623 | 0.8190048552 |
| RankingSVM | 0.8225893034 | 0.8169257071 |
| AdaRank | 0.8182340058 | 0.8153622186 |
| RankNet | 0.8223464432 | 0.8169752826 |
| LambdaRank | 0.8209622031 | 0.8126243442 |
| ListMLE | 0.8217342257 | 0.8174866943 |

Table 2 shows the results obtained by different regression
algorithms in terms of NDCG@10. In Table 2, “XT”,
“BRR”, and “SGDR” respectively denote Extremely
Randomized Trees, Bayesian Ridge Regression, and Stochastic
Gradient Descent Regression.

The results reported in Table 2 demonstrate that feature
selection does not help the regression algorithms. In
other words, after performing feature selection, the
results of the regression models drop considerably (e.g., from
0.8169 to 0.7507 for SGDR). This suggests that backward
elimination is not suitable for regression
models. According to Table 2, there is also a considerable
difference between the results achieved by different
regression models.

Table 3 shows the results of several learning to rank
methods, again in terms of NDCG@10 before and
after applying feature selection. The results reported in
Table 3 emphasize the importance of feature selection
for learning to rank methods, since the results of every
method improve after feature selection. Therefore, backward
elimination works well for LTR methods. Table 3 also
demonstrates that ListNet performs better than the other
LTR methods. Comparing the results of Table 2 and Table 3
shows that, in their best configurations, all the learning to
rank methods outperform all the regression models.


Table 4: Ranking aggregation results

| Method | NDCG@10 |
|---|---|
| LTRs | 0.8242044953 |
| REGs | 0.8063031984 |
| LTRs+REGs | 0.8261454943 |


Table 4 presents the results obtained by aggregating
the mentioned regression and learning to rank results
using the supervised Kemeny approach. To show the importance
of considering both regression and learning to rank methods
together, we also report the results achieved by aggregating
all the LTR methods and all the regression methods
separately. Table 4 indicates that although most of the results
of the regression models are far lower than those of the LTR
methods, adding them to the aggregation improves the results:
aggregating regression and learning to rank methods together
achieves better results than aggregating only LTR methods or
only regression models. To show that this improvement is
significant, we performed 10-fold cross validation over the
training data and conducted a statistical significance test
(t-test) on the improvements of LTRs+REGs over the other
methods. The results show that the improvement achieved by
LTRs+REGs is statistically significant (p-value < 0.01).
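
As an illustration of such a significance test, the sketch below computes a paired t statistic over per-fold scores and compares it with the two-sided critical value for 9 degrees of freedom at the 0.01 level (about 3.250). Whether the authors used a paired test is an assumption, and the per-fold numbers are invented purely for the example; they are not results from the paper.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the mean per-fold difference between two methods."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Invented per-fold NDCG@10 values for two methods (illustration only).
combined = [0.826, 0.824, 0.829, 0.825, 0.827, 0.823, 0.828, 0.826, 0.825, 0.827]
ltr_only = [0.824, 0.822, 0.826, 0.823, 0.825, 0.822, 0.825, 0.824, 0.823, 0.824]

t_value = paired_t_statistic(combined, ltr_only)
T_CRITICAL = 3.250  # two-sided critical value, 9 degrees of freedom, alpha = 0.01
significant = abs(t_value) > T_CRITICAL
```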

5. CONCLUSIONS 

In this paper, to rank the tweets of each user by
their engagement, we first defined several features
partitioned into three different categories: user-based,
movie-based, and tweet-based. We showed that after performing
feature selection, the features are selected from all of
these categories. Then, we exploited regression and learning
to rank methods to rank the tweets of each user by their
engagement. Finally, we aggregated the results of all the
regression and learning to rank methods using the supervised
Kemeny approach.

We evaluated our methods on an extended version of the
MovieTweetings dataset provided by the ACM RecSys Challenge
2014. The experimental results demonstrate that feature
selection significantly affects the performance. The results
also show that although the results of most regression models
are far lower than those of the learning to rank methods,
aggregating both improves the performance.